The Workflow Trace Archive: Open-Access Data from Public and Private Computing Infrastructures -- Technical Report
Realistic, relevant, and reproducible experiments often need input traces
collected from real-world environments. We focus in this work on traces of
workflows---common in datacenters, clouds, and HPC infrastructures. We show
that the state of the art in using workflow traces raises important issues: (1)
the use of realistic traces is infrequent, and (2) the use of realistic,
open-access traces even more so. To alleviate these issues, we introduce the
Workflow Trace Archive (WTA), an open-access archive of workflow traces from
diverse computing infrastructures and tooling to parse, validate, and analyze
traces. The WTA includes million workflows captured from
computing infrastructures, representing a broad diversity of trace domains and
characteristics. To emphasize the importance of trace diversity, we
characterize the WTA contents and analyze in simulation the impact of trace
diversity on experiment results. Our results indicate significant differences
in characteristics, properties, and workflow structures between workload
sources, domains, and fields.
Simplified workflow simulation on clouds based on computation and communication noisiness
Many researchers rely on simulations to analyze and validate their methods on Cloud infrastructures. However,
determining relevant simulation parameters and correctly instantiating them to match real Cloud performance is a difficult and
costly operation, as minor configuration changes can easily produce unreliable, inaccurate simulation results. Using legacy values
experimentally determined by other researchers can reduce the configuration costs, but is still inaccurate because the underlying public Clouds
and the number of active tenants differ widely and change over time. To overcome these deficiencies, we propose a novel model that
simulates the dynamic Cloud performance by introducing noise in the computation and communication tasks, determined by a small set of
runtime execution data. Although the estimation method appears costly, a comprehensive sensitivity analysis shows that the
configuration parameters determined for a certain simulation setup can be used for other simulations too, thereby reducing the tuning cost
by up to 82.46 percent, while lowering the simulation accuracy by only 1.98 percent on average. Extensive evaluation also shows that our
novel model outperforms other state-of-the-art dynamic Cloud simulation models, leading to up to 22 percent lower makespan inaccuracy.
This work was supported by the ASPIDE Project funded by the European Union’s Horizon 2020 Research and Innovation Programme under Grant agreement No. 801091.